Dominique Kellam
Jadon Calvert
Eastern Kentucky University
The primary objective of this project is to identify and visualize meaningful trends in US car accidents from 2016 to 2023. Our goal is to uncover when accidents occur most frequently, whether by time of day, day of the week, or month of the year, and to determine whether specific holidays are associated with increased accident rates. We also aim to explore geographic trends by identifying which states experience the highest and lowest numbers of accidents and by assessing whether environmental factors such as weather, visibility, or road conditions contribute to accident severity. Additionally, we will examine long-term trends in accident frequency to understand how they have changed over the years. By presenting our findings through a series of targeted visualizations, we hope to provide insights that could be valuable for public safety efforts, transportation planning, or future academic research.
This analysis uses the U.S. Accidents (2016–2023) dataset compiled by Sobhan Moosavi, which is publicly available on Kaggle. The dataset contains over 7.5 million records of traffic accidents that occurred in the United States between February 2016 and March 2023.
Each row in the dataset represents a single traffic accident and contains information collected from traffic cameras, sensors, police reports, and other public sources. The data includes:

- Timestamp and location
- Weather conditions
- Traffic and visibility indicators
- Accident severity (rated 1 to 4)
The following variables were selected or engineered for this project:
- Severity: Level of accident seriousness (1 = least severe, 4 = most severe)
- Start_Time: Timestamp of when the accident began
- Temperature(F), Precipitation(in), Wind_Chill(F), Visibility(mi): Weather-related variables
- Weather_Condition: Categorical weather label (e.g., Clear, Rain, Fog)
- State: Abbreviation of the U.S. state
- date_, hour, month, day_of_week: Time-based features derived from Start_Time
- holiday_specific: Boolean indicator for U.S. holidays (e.g., Memorial Day, Christmas)
These features were used to explore patterns in accident frequency and severity across time, weather, and holidays.
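As a minimal sketch of how the time-based features could be derived from Start_Time, the base R snippet below builds date_, hour, month, day_of_week, and holiday_specific on three toy timestamps. The toy data frame, the UTC time zone, and the two-date holiday lookup are assumptions made for illustration; they are not the project's actual preparation code or holiday list.

```r
# Toy timestamps standing in for the Start_Time column (illustrative values only).
accidents <- data.frame(
  Start_Time = as.POSIXct(
    c("2016-02-08 05:46:00", "2019-12-25 17:30:00", "2021-05-31 08:15:00"),
    tz = "UTC"  # assumption: a single time zone keeps the example simple
  )
)

# Time-based features derived from Start_Time.
accidents$date_       <- as.Date(accidents$Start_Time, tz = "UTC")
accidents$hour        <- as.integer(format(accidents$Start_Time, "%H"))
accidents$month       <- as.integer(format(accidents$Start_Time, "%m"))
accidents$day_of_week <- weekdays(accidents$date_)  # names depend on the locale

# Hypothetical holiday lookup containing only the dates this example needs:
# Christmas 2019 and Memorial Day 2021.
holiday_dates <- as.Date(c("2019-12-25", "2021-05-31"))
accidents$holiday_specific <- accidents$date_ %in% holiday_dates

print(accidents)
```

A full holiday table would need one entry per holiday per year, since holidays such as Memorial Day fall on a different date each year.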
## ID Source Severity
## Length:7546771 Length:7546771 Min. :1.000
## Class :character Class :character 1st Qu.:2.000
## Mode :character Mode :character Median :2.000
## Mean :2.212
## 3rd Qu.:2.000
## Max. :4.000
##
## Start_Time End_Time
## Min. :2016-01-14 20:18:33.00 Min. :2016-02-08 06:37:08.00
## 1st Qu.:2018-11-20 16:22:02.00 1st Qu.:2018-11-20 17:22:44.50
## Median :2020-11-10 08:23:39.00 Median :2020-11-10 15:11:14.00
## Mean :2020-06-02 04:07:56.43 Mean :2020-06-02 11:34:12.32
## 3rd Qu.:2022-01-19 08:15:20.50 3rd Qu.:2022-01-19 19:01:21.00
## Max. :2023-03-31 23:30:00.00 Max. :2023-03-31 23:59:00.00
##
## Start_Lat Start_Lng End_Lat End_Lng
## Min. :24.55 Min. :-124.62 Min. :25 Min. :-125
## 1st Qu.:33.38 1st Qu.:-117.22 1st Qu.:33 1st Qu.:-118
## Median :35.80 Median : -87.81 Median :36 Median : -88
## Mean :36.19 Mean : -94.71 Mean :36 Mean : -96
## 3rd Qu.:40.11 3rd Qu.: -80.38 3rd Qu.:40 3rd Qu.: -80
## Max. :49.00 Max. : -67.11 Max. :49 Max. : -67
## NA's :3341777 NA's :3341777
## Distance(mi) Description Street City
## Min. : 0.000 Length:7546771 Length:7546771 Length:7546771
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.028 Mode :character Mode :character Mode :character
## Mean : 0.558
## 3rd Qu.: 0.460
## Max. :441.750
##
## County State Zipcode Country
## Length:7546771 Length:7546771 Length:7546771 Length:7546771
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Timezone Airport_Code Weather_Timestamp
## Length:7546771 Length:7546771 Min. :2016-01-14 19:51:00.00
## Class :character Class :character 1st Qu.:2018-11-20 16:15:00.00
## Mode :character Mode :character Median :2020-11-10 08:30:00.00
## Mean :2020-06-02 04:08:26.56
## 3rd Qu.:2022-01-19 07:58:00.00
## Max. :2023-03-31 23:53:00.00
##
## Temperature(F) Wind_Chill(F) Humidity(%) Pressure(in)
## Min. :-58.00 Min. :-80.0 Min. : 1.00 Min. : 0.00
## 1st Qu.: 49.00 1st Qu.: 43.0 1st Qu.: 48.00 1st Qu.:29.37
## Median : 64.00 Median : 62.0 Median : 67.00 Median :29.86
## Mean : 61.67 Mean : 58.3 Mean : 64.84 Mean :29.54
## 3rd Qu.: 76.00 3rd Qu.: 75.0 3rd Qu.: 84.00 3rd Qu.:30.03
## Max. :129.20 Max. :128.0 Max. :100.00 Max. :58.63
## NA's :1833858 NA's :10278 NA's :7904
## Visibility(mi) Wind_Direction Wind_Speed(mph) Precipitation(in)
## Min. : 0.00 Length:7546771 Min. : 0.0 Min. : 0.0
## 1st Qu.: 10.00 Class :character 1st Qu.: 4.6 1st Qu.: 0.0
## Median : 10.00 Mode :character Median : 7.0 Median : 0.0
## Mean : 9.09 Mean : 7.7 Mean : 0.0
## 3rd Qu.: 10.00 3rd Qu.: 10.4 3rd Qu.: 0.0
## Max. :140.00 Max. :1087.0 Max. :36.5
## NA's :39499 NA's :429617 NA's :2064937
## Weather_Condition Amenity Bump Crossing
## Length:7546771 Mode :logical Mode :logical Mode :logical
## Class :character FALSE:7453817 FALSE:7543317 FALSE:6688257
## Mode :character TRUE :92954 TRUE :3454 TRUE :858514
##
##
##
##
## Give_Way Junction No_Exit Railway
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:7511266 FALSE:6990004 FALSE:7527521 FALSE:7481985
## TRUE :35505 TRUE :556767 TRUE :19250 TRUE :64786
##
##
##
##
## Roundabout Station Stop Traffic_Calming
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:7546527 FALSE:7348460 FALSE:7337406 FALSE:7539342
## TRUE :244 TRUE :198311 TRUE :209365 TRUE :7429
##
##
##
##
## Traffic_Signal Turning_Loop Sunrise_Sunset Civil_Twilight
## Mode :logical Mode :logical Length:7546771 Length:7546771
## FALSE:6424853 FALSE:7546771 Class :character Class :character
## TRUE :1121918 Mode :character Mode :character
##
##
##
##
## Nautical_Twilight Astronomical_Twilight date_ year
## Length:7546771 Length:7546771 Min. :2016-01-14 Min. :2016
## Class :character Class :character 1st Qu.:2018-11-20 1st Qu.:2018
## Mode :character Mode :character Median :2020-11-10 Median :2020
## Mean :2020-06-01 Mean :2020
## 3rd Qu.:2022-01-19 3rd Qu.:2022
## Max. :2023-03-31 Max. :2023
##
## month hour precipitation any_precip
## Min. : 1.0 Min. : 0.00 Min. : 0.00000 Mode :logical
## 1st Qu.: 3.0 1st Qu.: 8.00 1st Qu.: 0.00000 FALSE:7016077
## Median : 7.0 Median :13.00 Median : 0.00000 TRUE :530694
## Mean : 6.7 Mean :12.33 Mean : 0.00613
## 3rd Qu.:10.0 3rd Qu.:17.00 3rd Qu.: 0.00000
## Max. :12.0 Max. :23.00 Max. :36.47000
##
## weather temperature visibility wind_chill
## Length:7546771 Min. :-58.00 Min. : 0.00 Min. :-80.0
## Class :character 1st Qu.: 49.00 1st Qu.: 10.00 1st Qu.: 43.0
## Mode :character Median : 64.00 Median : 10.00 Median : 62.0
## Mean : 61.67 Mean : 9.09 Mean : 58.3
## 3rd Qu.: 76.00 3rd Qu.: 10.00 3rd Qu.: 75.0
## Max. :129.20 Max. :140.00 Max. :128.0
## NA's :39499 NA's :1833858
## sevg state_name
## Length:7546771 Length:7546771
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## [1] 7546771 58
## tibble [7,546,771 × 58] (S3: tbl_df/tbl/data.frame)
## $ ID : chr [1:7546771] "A-1" "A-2" "A-3" "A-4" ...
## $ Source : chr [1:7546771] "Source2" "Source2" "Source2" "Source2" ...
## $ Severity : num [1:7546771] 3 2 2 3 2 3 2 3 2 3 ...
## $ Start_Time : POSIXct[1:7546771], format: "2016-02-08 05:46:00" "2016-02-08 06:07:59" ...
## $ End_Time : POSIXct[1:7546771], format: "2016-02-08 11:00:00" "2016-02-08 06:37:59" ...
## $ Start_Lat : num [1:7546771] 39.9 39.9 39.1 39.7 39.6 ...
## $ Start_Lng : num [1:7546771] -84.1 -82.8 -84 -84.2 -84.2 ...
## $ End_Lat : num [1:7546771] NA NA NA NA NA NA NA NA NA NA ...
## $ End_Lng : num [1:7546771] NA NA NA NA NA NA NA NA NA NA ...
## $ Distance(mi) : num [1:7546771] 0.01 0.01 0.01 0.01 0.01 0.01 0 0.01 0 0.01 ...
## $ Description : chr [1:7546771] "Right lane blocked due to accident on I-70 Eastbound at Exit 41 OH-235 State Route 4." "Accident on Brice Rd at Tussing Rd. Expect delays." "Accident on OH-32 State Route 32 Westbound at Dela Palma Rd. Expect delays." "Accident on I-75 Southbound at Exits 52 52B US-35. Expect delays." ...
## $ Street : chr [1:7546771] "I-70 E" "Brice Rd" "State Route 32" "I-75 S" ...
## $ City : chr [1:7546771] "Dayton" "Reynoldsburg" "Williamsburg" "Dayton" ...
## $ County : chr [1:7546771] "Montgomery" "Franklin" "Clermont" "Montgomery" ...
## $ State : chr [1:7546771] "OH" "OH" "OH" "OH" ...
## $ Zipcode : chr [1:7546771] "45424" "43068-3402" "45176" "45417" ...
## $ Country : chr [1:7546771] "US" "US" "US" "US" ...
## $ Timezone : chr [1:7546771] "US/Eastern" "US/Eastern" "US/Eastern" "US/Eastern" ...
## $ Airport_Code : chr [1:7546771] "KFFO" "KCMH" "KI69" "KDAY" ...
## $ Weather_Timestamp : POSIXct[1:7546771], format: "2016-02-08 05:58:00" "2016-02-08 05:51:00" ...
## $ Temperature(F) : num [1:7546771] 36.9 37.9 36 35.1 36 37.9 34 34 33.3 37.4 ...
## $ Wind_Chill(F) : num [1:7546771] NA NA 33.3 31 33.3 35.5 31 31 NA 33.8 ...
## $ Humidity(%) : num [1:7546771] 91 100 100 96 89 97 100 100 99 100 ...
## $ Pressure(in) : num [1:7546771] 29.7 29.6 29.7 29.6 29.6 ...
## $ Visibility(mi) : num [1:7546771] 10 10 10 9 6 7 7 7 5 3 ...
## $ Wind_Direction : chr [1:7546771] "Calm" "Calm" "SW" "SW" ...
## $ Wind_Speed(mph) : num [1:7546771] NA NA 3.5 4.6 3.5 3.5 3.5 3.5 1.2 4.6 ...
## $ Precipitation(in) : num [1:7546771] 0.02 0 NA NA NA 0.03 NA NA NA 0.02 ...
## $ Weather_Condition : chr [1:7546771] "Light Rain" "Light Rain" "Overcast" "Mostly Cloudy" ...
## $ Amenity : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Bump : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Crossing : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Give_Way : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Junction : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ No_Exit : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Railway : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Roundabout : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Station : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Stop : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Traffic_Calming : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Traffic_Signal : logi [1:7546771] FALSE FALSE TRUE FALSE TRUE FALSE ...
## $ Turning_Loop : logi [1:7546771] FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Sunrise_Sunset : chr [1:7546771] "Night" "Night" "Night" "Night" ...
## $ Civil_Twilight : chr [1:7546771] "Night" "Night" "Night" "Day" ...
## $ Nautical_Twilight : chr [1:7546771] "Night" "Night" "Day" "Day" ...
## $ Astronomical_Twilight: chr [1:7546771] "Night" "Day" "Day" "Day" ...
## $ date_ : Date[1:7546771], format: "2016-02-08" "2016-02-08" ...
## $ year : num [1:7546771] 2016 2016 2016 2016 2016 ...
## $ month : num [1:7546771] 2 2 2 2 2 2 2 2 2 2 ...
## $ hour : int [1:7546771] 5 6 6 7 7 7 7 7 8 8 ...
## $ precipitation : num [1:7546771] 0.02 0 0 0 0 0.03 0 0 0 0.02 ...
## $ any_precip : logi [1:7546771] TRUE FALSE FALSE FALSE FALSE TRUE ...
## $ weather : chr [1:7546771] "Light Rain" "Light Rain" "Overcast" "Mostly Cloudy" ...
## $ temperature : num [1:7546771] 36.9 37.9 36 35.1 36 37.9 34 34 33.3 37.4 ...
## $ visibility : num [1:7546771] 10 10 10 9 6 7 7 7 5 3 ...
## $ wind_chill : num [1:7546771] NA NA 33.3 31 33.3 35.5 31 31 NA 33.8 ...
## $ sevg : chr [1:7546771] "more severe" "less severe" "less severe" "more severe" ...
## $ state_name : chr [1:7546771] "Ohio" "Ohio" "Ohio" "Ohio" ...
The raw dataset contains 7,728,394 observations (rows) of 46 variables (columns).
After data preparation and cleaning, the dataset contains 7,546,771 observations (rows) of 58 variables (columns).
| Severity | Number of Accidents |
|---|---|
| least severe | 66121 |
| less severe | 6010987 |
| more severe | 1272321 |
| most severe | 197342 |
The dataset's author defines severity as “the impact on traffic.” Low-severity accidents have a minimal effect on traffic, whereas high-severity accidents have a significant impact on traffic.
We can observe that the majority of accidents that took place between 2016 and 2023 were categorized as “less severe,” accounting for 6,010,987 of the total 7,546,771 accidents.
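To put these counts in proportion, the short sketch below converts the severity table into percentage shares. The counts are copied from the table above; the variable names are ours.

```r
# Accident counts per severity group, copied from the table above.
severity_counts <- c(
  "least severe" = 66121,
  "less severe"  = 6010987,
  "more severe"  = 1272321,
  "most severe"  = 197342
)

# Total number of accidents and each group's share of that total, in percent.
total     <- sum(severity_counts)
share_pct <- round(100 * severity_counts / total, 1)

print(total)
print(share_pct)
```

The "less severe" group accounts for roughly 80% of all records, so any average severity reported later is dominated by that single category.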
The interactive time series plot shows daily accident counts across the United States from 2016 to 2023. The frequency of reported accidents increased noticeably after 2020, with peaks exceeding 10,000 accidents per day. This upward trend may reflect improved reporting mechanisms, changes in driving behavior, or broader shifts in traffic volume and weather conditions.
Among the top 10 most common weather conditions, “Overcast” and “Scattered Clouds” were associated with the highest average accident severity. In contrast, conditions such as “Fair” and “Fog” were linked to lower severity scores. This suggests that overcast or unstable weather may contribute to more serious traffic incidents, although the comparatively low severity under “Fog” shows that reduced visibility alone does not explain the pattern.
The distribution of accidents by hour reveals two major peaks: one around 7–8 AM and another between 3 and 6 PM, corresponding to typical rush hour periods. The fewest accidents occur during the early morning hours; counts then rise through the day and fall again in the evening.
Accidents occurred most frequently on weekdays, with Friday showing the highest count, followed closely by Wednesday and Thursday. Sundays and Saturdays saw significantly fewer accidents. This pattern reflects increased commuting activity during the workweek compared to weekends.
December experienced the highest number of accidents, followed by January and November. Accident frequency was generally lower in the summer months, particularly July. This trend may reflect seasonal variations such as holiday travel, winter weather conditions, or changes in daylight and visibility.
| State | Accidents Per 100K |
|---|---|
| South Carolina | 6992.596 |
| California | 4351.219 |
| Oregon | 4167.566 |
| Florida | 3831.624 |
| Minnesota | 3300.101 |
We can observe that, when adjusted for population, South Carolina, California, Oregon, Florida, and Minnesota had the most accidents per 100,000 residents from 2016 to 2023.
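The population adjustment amounts to dividing each state's accident count by its population and scaling to 100,000 residents. The sketch below illustrates the arithmetic; the helper name rate_per_100k and the two toy states with their counts and populations are hypothetical and are not taken from the dataset or from census figures.

```r
# Accidents per 100,000 residents (helper name is ours, for illustration).
rate_per_100k <- function(n_accidents, population) {
  n_accidents / population * 1e5
}

# Hypothetical states with made-up counts and populations.
toy <- data.frame(
  state       = c("State A", "State B"),
  n_accidents = c(50000, 120000),
  population  = c(1000000, 6000000)
)

toy$per_100k <- rate_per_100k(toy$n_accidents, toy$population)

# Rank from the highest to the lowest population-adjusted rate.
toy <- toy[order(-toy$per_100k), ]
print(toy)
```

In this toy case State B has more accidents in absolute terms, yet State A ranks first once population is taken into account, which is why the ranking above can differ from a ranking by raw counts.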
| State | Average Accident Severity |
|---|---|
| Georgia | 2.507235 |
| Wisconsin | 2.473455 |
| Rhode Island | 2.459224 |
| Kentucky | 2.452863 |
| Colorado | 2.441580 |
While South Carolina had the most accidents per capita, its average severity was among the lowest of all the states. The states with the highest average severity were Georgia, Wisconsin, Rhode Island, Kentucky, and Colorado. Although some states had a higher average severity than others, the largest difference in average severity between any two states was only 0.49.
When we visualize the average accident temperature by state, we can observe that generally, accidents in northern states occur more frequently in cooler temperatures, while accidents in southern states occur more frequently in warmer temperatures.
We can observe that for most states, there doesn’t seem to be a correlation between average temperature and number of accidents, but there are a few outliers. There is a slight positive correlation for South Dakota and a slight negative correlation for Wyoming.
We can observe that accidents tend to be less frequent at each extreme: very cold and very hot temperatures see the fewest accidents. Accidents are slightly more frequent than average in the 40–50 °F range.
| Temperature Range | Average Severity |
|---|---|
| (10,20] | 2.936068 |
| (90,100] | 2.500000 |
| (30,40] | 2.427078 |
| (20,30] | 2.400340 |
| (70,80] | 2.315882 |
| (60,70] | 2.304957 |
| (50,60] | 2.295177 |
| (80,90] | 2.291958 |
| (40,50] | 2.263246 |
While fewer accidents occur at the temperature extremes, we can observe that the accidents that do occur are of a higher average severity. Accidents that occur when the temperature is between 10 and 20 °F tend to have the highest severity.
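The interval labels in the table, such as (10,20], match the format that R's cut() produces for right-closed bins. The sketch below shows one way such a table could be built. The 10-degree breaks from 0 to 100 and the toy values are assumptions; the project's actual breaks and data are not shown in this report.

```r
# Toy temperature (°F) and severity values, for illustration only.
temperature <- c(15, 18, 35, 45, 48, 72, 95)
severity    <- c(3, 3, 2, 2, 2, 2, 3)

# Right-closed 10-degree bins: (0,10], (10,20], ..., (90,100].
temp_bin <- cut(temperature, breaks = seq(0, 100, by = 10))

# Mean severity per bin; drop = TRUE omits bins with no observations.
avg_severity <- sapply(split(severity, temp_bin, drop = TRUE), mean)

# Order bins from highest to lowest average severity, as in the table above.
avg_severity <- sort(avg_severity, decreasing = TRUE)
print(avg_severity)
```

Values outside the outermost breaks become NA under cut(), so records colder than the lowest break or hotter than the highest one would silently fall out of a table built this way unless the breaks are widened.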
The heatmap shows the correlation between quantitative features such as temperature, wind chill, visibility, precipitation, and severity. Temperature and wind chill were nearly perfectly correlated (\(r = 0.99\)), as expected. However, severity had only weak correlations with all other variables, suggesting that accident severity is influenced by additional factors beyond those measured here.
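A correlation matrix like the one behind the heatmap can be sketched with cor(). Because wind_chill contains many missing values, the example uses pairwise-complete observations. That choice and the toy values are assumptions, as the report does not state how missing values were handled.

```r
# Toy values for four of the quantitative features (illustrative only).
features <- data.frame(
  temperature = c(36.9, 37.9, 36.0, 35.1, 64.0, 76.0),
  wind_chill  = c(NA,   NA,   33.3, 31.0, 62.0, 75.0),
  visibility  = c(10,   10,   10,   9,    6,    7),
  severity    = c(3,    2,    2,    3,    2,    3)
)

# Pearson correlations, using every pair of rows where both values are present.
cor_mat <- cor(features, use = "pairwise.complete.obs")
print(round(cor_mat, 2))
```

With pairwise deletion, each cell of the matrix can be based on a different number of rows, which is worth keeping in mind when comparing coefficients across variables with very different amounts of missing data.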
A one-way ANOVA was conducted to examine whether accident severity differs by weather condition. The results showed a statistically significant effect of weather on accident severity, \(F(4, 1,\!814,\!823) = 18,\!549\), \(p < .001\), indicating that the average severity of accidents varies across different weather conditions.
## Df Sum Sq Mean Sq F value Pr(>F)
## weather 4 18624 4656 18549 <2e-16 ***
## Residuals 1814823 455533 0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
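As a quick check on the table above, the F statistic can be recomputed from the sums of squares and degrees of freedom, and the same numbers yield eta squared as a rough effect size. The sketch below uses only the rounded values printed in the ANOVA output, so its results are approximate. Eta squared is our addition and was not part of the original analysis.

```r
# Values copied from the ANOVA table above (rounded as printed).
ss_weather <- 18624
df_weather <- 4
ss_resid   <- 455533
df_resid   <- 1814823

# Mean squares and the F ratio.
ms_weather <- ss_weather / df_weather
ms_resid   <- ss_resid / df_resid
f_stat     <- ms_weather / ms_resid

# Eta squared: share of total variance in severity attributed to weather.
eta_sq <- ss_weather / (ss_weather + ss_resid)

print(round(f_stat))
print(round(eta_sq, 3))
```

The recomputed F agrees with the reported value up to rounding. Eta squared comes out at about 0.04, which suggests that the very large F reflects the sample size of roughly 1.8 million rows more than a large practical effect of weather on severity.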
A Welch two-sample t-test was conducted to compare accident severity on specific holidays versus other days. The results showed a statistically significant difference in severity scores, \(t(93,\!469) = 2.50\), \(p = .0125\). The average severity on non-holidays (\(M = 2.212\)) was slightly higher than on holidays (\(M = 2.208\)), with a 95% confidence interval for the difference in means ranging from 0.0009 to 0.0073.
A Welch two-sample t-test was also conducted to examine differences in the average number of accidents per day on holidays versus non-holidays. The results were statistically significant, \(t(43.04) = 3.27\), \(p = .0021\). The mean number of accidents per day was higher on non-holidays (\(M = 2,\!947\)) compared to holidays (\(M = 2,\!173\)), with a 95% confidence interval for the difference in means ranging from 297 to 1,250.
## [1] "T-test on Severity (Specific Holidays):"
##
## Welch Two Sample t-test
##
## data: Severity by holiday_specific
## t = 2.4975, df = 93469, p-value = 0.01251
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
## 0.0008838382 0.0073295171
## sample estimates:
## mean in group FALSE mean in group TRUE
## 2.212178 2.208071
## [1] "T-test on Frequency (Specific Holidays):"
##
## Welch Two Sample t-test
##
## data: n_acc by holiday_specific
## t = 3.2727, df = 43.041, p-value = 0.002105
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
## 296.8096 1249.9020
## sample estimates:
## mean in group FALSE mean in group TRUE
## 2946.832 2173.476
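The pieces of the frequency test fit together as follows: the difference in means divided by the t statistic gives the standard error, and the confidence interval is the difference plus or minus the critical t value times that standard error. The sketch below reconstructs the interval from the printed output alone; it is a consistency check under that assumption, not a rerun of the test on the data.

```r
# Values copied from the "T-test on Frequency" output above.
mean_non_holiday <- 2946.832
mean_holiday     <- 2173.476
t_stat           <- 3.2727
df_welch         <- 43.041

# Difference in daily accident counts and the implied standard error.
diff_means <- mean_non_holiday - mean_holiday
std_error  <- diff_means / t_stat

# Two-sided 95% confidence interval using the Welch degrees of freedom.
t_crit <- qt(0.975, df = df_welch)
ci     <- diff_means + c(-1, 1) * t_crit * std_error

print(round(diff_means, 1))
print(round(ci, 1))
```

The reconstructed bounds agree with the reported interval of roughly 297 to 1,250 accidents per day. On raw data, the same test would typically be run with t.test(n_acc ~ holiday_specific), which applies the Welch correction by default.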
Although the difference is small, the chart shows a slightly higher average severity for accidents on non-holidays compared to holidays. The mean severity was 2.212 on non-holidays and 2.208 on holidays. The corresponding Welch t-test (\(t(93,\!469) = 2.50\), \(p = .0125\)) confirms that this difference is statistically significant, although not practically large. This suggests that while there are fewer accidents on holidays, they are not meaningfully more or less severe.
The bar chart clearly shows that the average number of accidents per day is significantly lower on specific holidays compared to non-holiday dates. On average, there were around 2,173 accidents per day on holidays versus 2,947 on non-holidays. This visual supports the results of the Welch two-sample t-test (\(t(43.04) = 3.27\), \(p = .0021\)), confirming that this difference is statistically significant. The lower volume on holidays may reflect reduced traffic due to time off from work and school.
The results of this analysis confirm several intuitive but important insights into traffic accident patterns in the United States. Time-based trends clearly reveal that accidents peak during weekday rush hours and are far less frequent during early morning hours and weekends. December’s heightened accident volume suggests seasonal effects such as holiday travel and poor weather conditions play a significant role.
Environmental factors such as temperature and precipitation demonstrated limited but interesting relationships with accident severity and frequency. While accidents were least common during extreme temperatures, those that occurred under these conditions tended to be more severe. Weather conditions like “Overcast” and “Scattered Clouds” were associated with higher average severity, possibly reflecting poor visibility or driver overconfidence in seemingly stable weather.
Geographic trends uncovered population-adjusted hotspots for traffic incidents. States like South Carolina and California exhibited the highest accident rates per 100,000 residents, yet they were not among the five states with the highest average severity. This distinction could point to differences in reporting practices, infrastructure quality, or emergency response times across states.
Finally, our holiday-based t-tests revealed that fewer accidents occur on holidays and that those accidents are not more severe: average severity was marginally lower on holidays (2.208 versus 2.212), a statistically significant but practically negligible difference. This suggests that reduced traffic volume likely offsets any increased risk associated with holiday distractions or celebrations.
This exploratory data visualization and statistical analysis of U.S. traffic accidents from 2016 to 2023 provides a multi-dimensional view of when, where, and under what conditions accidents are most likely to occur.
The findings emphasize:

- The importance of targeted safety efforts during weekday commute hours
- Seasonal and weather-related influences on accident frequency and severity
- Geographic disparities in accident rates that may warrant region-specific interventions
- That holidays may not be inherently more dangerous, but still merit focused traffic safety messaging due to lower yet impactful accident rates
This project underscores the power of combining time-series analysis, geospatial mapping, and statistical testing to support data-driven transportation planning and public safety strategy. Future research could expand by incorporating traffic volume data, urban/rural context, or vehicle-specific information to deepen these insights.